Visualization system for debug or performance analysis of SOC systems

ABSTRACT

A selection associated with a desired set of visual information, associated with a system on chip (SOC) that includes a hardware functional module and a firmware functional module, is received. A template is selected from a plurality of available templates based at least in part on the selection associated with the desired set of visual information. The selected template is used by the SOC to generate reported information, including by configuring the hardware functional module, as prescribed by the selected template, to generate select hardware-reported information and configuring the firmware functional module, as prescribed by the selected template, to generate select firmware-reported information. The reported information is received and the desired set of visual information is displayed.

CROSS REFERENCE TO OTHER APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/858,443 entitled VISUALIZATION SYSTEM FOR DEBUG OR PERFORMANCE ANALYSIS OF SOC SYSTEMS filed Jul. 6, 2022, which claims priority to U.S. Provisional Patent Application No. 63/222,264 entitled PERFORMANCE VISUALIZATION SYSTEM filed Jul. 15, 2021, each of which is incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Unexpected behaviors in embedded and/or system on chip (SOC) systems are notoriously difficult to debug given the real-time nature and the complexity of such systems. Debugging techniques that work with other types of systems (such as invasive debug probes added at key locations in non-real-time systems, “brute force” debug in simple systems, and the addition of probes at easily accessed points in distributed systems) are not easily portable and/or feasible with some embedded and/or SOC systems. New techniques that provide better tools and/or techniques for analyzing, debugging, and/or monitoring such systems would be desirable.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a flowchart illustrating an embodiment of a process to display visual information using hardware-reported information and firmware-reported information from an SOC system.

FIG. 2 is a system diagram illustrating an embodiment of an SOC system with reporting modules in each functional module to collect and store status information.

FIG. 3 is a diagram illustrating embodiments of a time-based message, an event-based message, and a timestamp message.

FIG. 4 is a diagram illustrating two embodiments of an aggregated and timestamped message stream in a message capture memory.

FIG. 5 is a diagram illustrating an embodiment of a Flash storage controller that is implemented on a SOC system.

FIG. 6 is a diagram illustrating an embodiment of performance-related visual information associated with read and write operations for a Flash storage controller.

FIG. 7 is a diagram illustrating an embodiment of a zoomed-in window showing information from event-based messages that are reported by an NVMe functional module in a Flash storage controller.

FIG. 8 is a diagram illustrating an embodiment of a process to generate visual information using a latency.

FIG. 9 is a diagram illustrating an embodiment of visual information showing gap values associated with a front-end, a middle-end, and a back-end of a Flash storage controller.

FIG. 10 is a diagram illustrating an embodiment of visual information showing workload metrics for a Flash storage controller.

FIG. 11 is a diagram illustrating an embodiment of visual information showing bus utilization for one channel in a Flash storage controller.

FIG. 12 is a diagram illustrating an embodiment of visual information showing die access information for a Flash storage controller.

FIG. 13 is a diagram illustrating an embodiment of visual information showing statistical latency information for a Flash storage controller.

FIG. 14 is a diagram illustrating an embodiment of visual information showing bus utilization information for all four channels in a Flash storage controller.

FIG. 15 is a flowchart illustrating an embodiment of a process to display visual information using hardware-reported information and firmware-reported information generated using a template from a visualization system.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

Various embodiments of a visualization technique and/or system to analyze, debug, and/or evaluate an embedded and/or system on chip (SOC) system are described herein. As used herein, the terms “embedded system” and “SOC (system)” are used interchangeably. As will be described in more detail below, such visualization systems may eliminate the need for expensive analyzers that may be difficult to attach to a SOC system; such visualization systems may also offer features, tools, and/or operations (e.g., which offer better insight into the inefficiencies and/or errors in the SOC system) that are not supported by existing analyzers.

FIG. 1 is a flowchart illustrating an embodiment of a process to display visual information using hardware-reported information and firmware-reported information from an SOC system. In some embodiments, the process is performed by a visualization system. In some embodiments, a visualization system is implemented using a visualization program that runs on a (e.g., general-purpose) computer. A visualization program may be implemented as an application that is installed on the computer, using a web-based application accessible via an Internet browser application, using a computer program product embodied on a computer readable storage medium, etc.

At 100, reported information from a system on chip (SOC) is received, wherein the reported information includes: (1) hardware-reported information that is reported by a hardware functional module included in the SOC and (2) firmware-reported information that is reported by a firmware functional module included in the SOC.

In one example of step 100, the reported information from the SOC system is first stored on some storage system or storage media that is external to both the SOC system and a computer on which a visualization application runs. Using the computer's built-in interface(s), the reported information is retrieved from the (external) storage system or storage media. Alternatively, in some embodiments, the reported information is directly passed from the SOC system (that generates the reported information) to the visualization system.

At 102, one or more display settings are received. At 104, visual information is generated based at least in part on: (1) the one or more display settings, (2) the hardware-reported information, and (3) the firmware-reported information.

At 106, the visual information is displayed. For example, a built-in display of a computer (on which a visualization program is running) may be used to display the visual information. Some example screenshots are described in more detail below.

In one example of steps 102, 104, and 106, display settings associated with a sequence of “zoom ins” are received and corresponding zoomed-in visual information is generated and displayed in response. An initial set of display settings may be obtained (e.g., stored in the visualization system) that is used to generate visual information for a start screen or an initial display. This start screen (at least in this example) is at the highest (e.g., hierarchical) level of the SOC, with the available functional modules (e.g., that were configured to report status and/or event information). For example, a graphical user interface displaying such a start window may include a first window with a list of the (e.g., available, reporting, etc.) hardware and firmware functional modules and another window may include (as an example) corresponding latency scatter plots associated with (e.g., high-level) operations for those functional modules.

A subsequent zoom-in instruction or interaction produces (at least in this example) a zoomed-in screen with performance metrics of the operations within the zoom range and a plot of the latency of the associated operations. For example, one window in the display may be an operation metrics table, summarizing one or more (e.g., performance) metrics associated with one or more operations. There may also be a window showing the underlying data used to calculate a given metric for a given operation. For example, the metric may be “average latency” and operations may be “read from memory” or “write to memory;” one window shows the data used to calculate the average latency for a read operation (e.g., in the form of a latency scatter plot) and another window shows the average latency for a write operation.
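As a rough illustration of how an operation metrics table such as the one described above might be computed, the following Python sketch groups latency samples by operation and summarizes them. The record layout `(operation, start_time, end_time)`, the function name `operation_metrics`, and the choice of summary columns are assumptions made for this illustration and are not taken from the described embodiments.

```python
from collections import defaultdict
from statistics import mean


def operation_metrics(records):
    """Summarize latency per operation.

    `records` is an iterable of (operation, start_time, end_time) tuples.
    This layout is hypothetical; a real visualization system would derive
    start and end times from the timestamps in the reported messages.
    """
    latencies = defaultdict(list)
    for operation, start_time, end_time in records:
        latencies[operation].append(end_time - start_time)

    # One row per operation: the samples themselves could feed a latency
    # scatter plot, while the summary values feed the metrics table.
    return {
        operation: {
            "count": len(samples),
            "average_latency": mean(samples),
            "max_latency": max(samples),
        }
        for operation, samples in latencies.items()
    }


table = operation_metrics([
    ("read from memory", 0, 10),
    ("read from memory", 5, 25),
    ("write to memory", 0, 40),
])
```

In this sketch, restricting `records` to those whose times fall inside the zoom range would produce the metrics for a zoomed-in screen.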

A further zoom in may present information at the lowest and/or most detailed level, such as operation metadata, a size of the operation or a size of a piece of data associated with the operation (e.g., the size of data being written, read, transmitted, received, transformed, etc.), the type of operation, settings associated with an operation, tag information, and used resources (e.g., from shared resources such as shared buffers, shared channels, shared RAID resources, etc.). To put it another way, the information displayed at the lower (lowest) levels may include values from the various fields in various messages (see, e.g., FIG. 3) that the (relevant) hardware functional modules and/or firmware functional modules generated and that are associated with the (e.g., selected) operations and/or events of interest.

In some embodiments, the visual information (e.g., generated at step 104 and displayed at step 106) is in the form of a 2D or 3D display. In one example, a workload view provides a user with a view of transactions and/or operations performed over time. In one 2D example, line plots of (e.g., performance) metrics are shown where the x-axis is time and the y-axis is the (e.g., performance) metric. In one 3D example, the x-axis is commands (e.g., number of commands), the y-axis is time, and the z-axis is size (e.g., of a given command or some data associated with the command). The time values for these displays may be obtained from timestamps in messages (see, e.g., the timestamp field (330) in the event-based message (320) as well as the payload (344) in the timestamp message (340) in FIG. 3).

Before describing various features and/or embodiments of the visualization system in more detail, it may be helpful to give some examples of an SOC system that generates reported information (e.g., received at step 100) and that is analyzed by a visualization system; more detailed examples of reported information may also be helpful. The following figures describe some example SOC systems and some examples of reported information.

FIG. 2 is a system diagram illustrating an embodiment of an SOC system with reporting modules in each functional module to collect and store status information. In this example, the exemplary SOC system (200) includes a plurality of functional modules (202). In various embodiments, the exemplary SOC system (200) is an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), etc. In various embodiments, a functional module (202) is a hardware module composed solely of electronic circuits, a firmware module composed of instruction code operating on a processing unit, etc.

In this example, the ability to analyze (e.g., in real-time or after some wait) the behavior and/or operation of each of the plurality of functional modules (202) is desired. For example, each of the functional modules may contribute to the critical processing or movement of data, making subsequent analysis and/or review highly desirable. To that end, each functional module (202) has a sub-component, referred to in this figure as a reporting module (204), that gathers status information from within its functional module, encapsulates that status information in a standardized message format, and sends the encapsulated information on a dedicated link (206) to a central message gathering module, referred to in this figure as an aggregation module (208). In some embodiments, less critical and/or less interesting (from a debug perspective) functional modules do not have a reporting module.

In this example, each reporting module (204) includes one or more reporting rules (not shown) which describe the conditions under which time-based messages and/or event-based messages are generated and sent. These reporting rules may also describe what specific registers, values, nodes, states, etc. should be included or otherwise used as the status information that is sent to the aggregation module (208). For example, each reporting module (204) may include a controller that compares the condition(s) specified in the reporting rules against the relevant variables, states, events, etc. in that particular functional module.

In this example, messages are received from the various links (206) by the aggregation module (208) and are aggregated into a single, aggregated stream. A timestamp is then inserted into the aggregated stream to obtain a timestamped and aggregated stream. Although this example describes aggregating first and then timestamping, in some embodiments that order is reversed.

The timestamped and aggregated stream is passed from the aggregation module (208) to the memory (210). From the memory (210), the timestamped and aggregated stream is transported out of the SOC system (200) via a storage interface (216) so that the information can be exported off-chip to an (external) storage medium (218), such as Flash and/or solid state drive (SSD) memory. For example, this testing infrastructure was first prototyped and/or implemented on a storage controller and therefore the storage interface (216) was already implemented. To put it another way, in some embodiments, the storage interface (216) is a “production” interface that is/was already implemented and/or is used by the routine, non-debug-related operations supported by the SOC (200).

From the (external) storage medium (218), a visualization system (214) is configured to ingest, filter, display, and/or analyze the timestamped and aggregated stream that is obtained from the storage medium (218), as instructed by a user.

In one specific example, the SOC (200) (i.e., the device which is analyzed using the visualization system (214)) is a Flash storage controller and the storage medium (218) is the Flash storage medium that is being managed and/or controlled by the Flash storage controller. In this application, since there is a storage medium (218) that is readily available and the SOC (200) controls writing to that storage medium (e.g., so there is no concern about any captured information being accidentally overwritten by another device), the information in the memory (210) is sent off-chip via a storage interface (216). If or when analysis by a visualization system is desired, the storage medium (218) may be accessed by the information processor to retrieve the (e.g., debugging) information stored therein.

The following figure describes some examples of standardized message formats that may be used by a reporting module (204).

FIG. 3 is a diagram illustrating embodiments of a time-based message, an event-based message, and a timestamp message. These are some examples of a standardized message format in which status information may be included. These messages may be then aggregated and timestamped to create an aggregated and timestamped stream which is stored, and which a visualization system may ingest in order to analyze the SOC system that generated the messages. In FIG. 2, the reporting modules (204) may use one or more of the exemplary standardized message formats shown here to send status information to the aggregation module (208). In various embodiments, a system may support any number of standardized message formats (i.e., message types).

In this example, three message types are shown which are identified by a type field (302, 322, 342) in a fixed location in the message, which in this example is the first three bits of each message. The first message type in this example is a time-based message (300) where the first field (comprising 3 bits) is the type field (302), with a value that identifies the message as a time-based message.

The type field (302) is followed by a 3-bit sequence field (304). This field is a time-based sequence identifier that is included for cases where the operating frequency of the given functional module is greater than the link frequency between a given reporting module (e.g., one of 204 in FIG. 2) and the aggregation module (e.g., 208 in FIG. 2). For example, this is a common occurrence when the interconnect between the ports and the aggregation modules operates at (as an example) 25% of the nominal system operating frequency, which (as described above) conserves power. With the system so configured, messages may occasionally be dropped because the rate of production (e.g., by the reporting module) is greater than the rate of uptake or transport (e.g., by the aggregation module). The condition is detected and post-processed as a non-uniform increment of the sequence identifier of two adjacent messages. In one example, the sequence field (304) is incremented by one each time a new time-based message is sent so that if there is a jump of two or more, then the post-processor knows some messages have been lost. In some embodiments, the sequence number (304) is based on and/or includes some bits from the timestamp that is used for the timestamp message (340).
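The post-processing check described above can be sketched in Python as follows. This is a minimal illustration that assumes the sequence identifiers come from a single reporting module, that the identifier is incremented by one per message, and that the 3-bit field wraps from 7 back to 0; the function name `count_dropped` is hypothetical.

```python
SEQ_BITS = 3              # width of the sequence field (304) in this example
SEQ_MOD = 1 << SEQ_BITS   # a 3-bit identifier wraps around after 7


def count_dropped(sequence_ids):
    """Return a lower bound on the number of lost time-based messages.

    For each pair of adjacent messages, an increment of exactly one
    (modulo 8) means nothing was lost; a larger jump means that
    (jump - 1) messages are missing. Because the field wraps, a burst
    of eight or more lost messages cannot be counted exactly.
    """
    dropped = 0
    for previous, current in zip(sequence_ids, sequence_ids[1:]):
        dropped += (current - previous - 1) % SEQ_MOD
    return dropped


no_loss = count_dropped([6, 7, 0, 1])   # wraparound, uniform increments
two_lost = count_dropped([0, 1, 4, 5])  # identifiers 2 and 3 are missing
```

A wider sequence field would detect longer bursts of loss at the cost of a larger time-based message.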

The next field in the time-based message (300) is the payload field (306) which has 10 bits. For example, a time-based reporting rule may specify what status information to include in the payload field (306) in a time-based message (300). In one example, time-based messages (300) are sent that include the value or state of a finite state machine when (or while) the finite state machine is not in the idle state. These time-based messages (300) continue to be periodically sent until the finite state machine returns to the idle state, at which point time-based message generation stops (at least in this example).

The second type of message in this example is the event-based message (320). As with the other message types, the first field is the 3-bit type field (322), where the value identifies the message as an event-based message.

Next are a 13-bit tag field (324) and 3-bit sub-type field (326). The sub-type field (326) is sometimes referred to as an event identifier field because the various events that are captured and reported by an event-based message are each assigned an event number or identifier. In this example, because the sub-type (i.e., event identifier) field (326) has 3 bits, the event identifiers range from 0 to 7, inclusive.

For some functional modules, having eight event identifiers is sufficient to uniquely identify all events for which recording is desired. However, some functional modules support many different types of operations (e.g., transmit as well as receive), have many intermediate events of interest between a start event and an end event, and/or have a control channel and a data channel (each with its own associated events). Therefore, in some cases, eight event identifiers are insufficient to uniquely identify all of the events for which reporting is desired. In this example, to accommodate such situations, some events share an event identifier value and some portion of the tag field (324) is used to distinguish between the events that share an event identifier value. In cases where an event identifier value is unique (i.e., it is associated with only one event), the tag is used to transmit other information, such as additional status information (e.g., per the instructions in the relevant event-based reporting rule).

Next in the event-based message (320) are two reserved fields. The first reserved field (328) is a 5-bit space reserved for replacement with an identifier of the message-producing functional module. For example, in FIG. 2, a functional module (202) would generate event-based message (320) but leave this first reserved field (328) blank. At the aggregation module (208), the first reserved field (328) is filled in by the aggregation module (208) with the identifier associated with the functional module (202) from which the event-based message (320) was received (e.g., known because the links (206) are dedicated, not shared).

The second reserved field (330) is an 8-bit field that is reserved for replacement with a timestamp. The event message timestamp (330) serves the same purpose as the time-based sequence identifier (304) in the time-based message (300). However, because event-based messages occur more sporadically than time-based messages, they need a larger field to capture a larger time difference between event-based messages (320) or between an event-based message (320) and a timestamp message (340). As with the first reserved field (328), the second reserved field is left blank by a reporting module (e.g., 204 in FIG. 2) and/or functional module (e.g., 202 in FIG. 2) and is filled in at the aggregation module (e.g., 208 in FIG. 2).
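The way an aggregation module might fill in the two reserved fields can be sketched in Python as follows. This is a software model under stated assumptions, not a description of the hardware: messages are modeled as dictionaries, the function name `aggregate`, the `link_to_module_id` table, and the tuple layout of `arrivals` are hypothetical, and the low eight bits of a running time counter are used for the timestamp field, as some embodiments described herein do.

```python
MODULE_ID_BITS = 5   # width of the first reserved field (328)
TIMESTAMP_BITS = 8   # width of the second reserved field (330)


def aggregate(arrivals, link_to_module_id):
    """Merge event-based messages from dedicated links into one stream.

    `arrivals` is an iterable of (current_time, link_index, message)
    tuples in arrival order. Each message is a dict whose reserved
    fields were left blank by the reporting module.
    """
    stream = []
    for current_time, link_index, message in arrivals:
        stamped = dict(message)  # leave the caller's message untouched
        # The link is dedicated, so the link index identifies the sender.
        module_id = link_to_module_id[link_index]
        stamped["module_id"] = module_id & ((1 << MODULE_ID_BITS) - 1)
        # Only the low bits of the running time fit in the reserved field.
        stamped["timestamp"] = current_time & ((1 << TIMESTAMP_BITS) - 1)
        stream.append(stamped)
    return stream


stream = aggregate(
    [(0x1F8, 0, {"sub_type": 1, "payload": 7}),
     (0x205, 2, {"sub_type": 3, "payload": 9})],
    link_to_module_id={0: 4, 2: 17},
)
```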

The last field in the event-based message (320) is a 32-bit payload field (332), used to carry the payload associated with the relevant event. The specific (status) information that is included in the payload (332) is specified by the relevant event-based reporting rule.

Conceptually, event-based reporting (e.g., using an event-based reporting rule) may be thought of as a (more) sparse reporting technique because only one event-based message is generated when a condition is detected. In contrast, time-based reporting (e.g., using a time-based reporting rule) is a (more) dense reporting technique because time-based messages are continuously generated while the condition is satisfied. As such, time-based reporting will tend to generate many more messages than event-based reporting. To account for this difference, the size of the exemplary time-based message (300) is smaller than the size of the exemplary event-based message (320) in this example (e.g., 16 bits for the exemplary time-based message (300) vs. 64 bits for the exemplary event-based message (320)).
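The difference in message density can be illustrated with a small Python sketch. It assumes a finite state machine that is sampled once per reporting period, a time-based rule that reports whenever the state is not idle, and a hypothetical event-based rule that reports once each time the state machine leaves the idle state; the state names and function names are made up for this illustration.

```python
def time_based_message_count(states, idle="IDLE"):
    """Dense reporting: one message per sample while the FSM is not idle."""
    return sum(1 for state in states if state != idle)


def event_based_message_count(states, idle="IDLE"):
    """Sparse reporting: one message each time the FSM leaves idle."""
    count = 0
    previous = idle
    for state in states:
        if previous == idle and state != idle:
            count += 1
        previous = state
    return count


# One long read followed by a short write, sampled once per period.
trace = ["IDLE"] + ["READ"] * 20 + ["IDLE"] + ["WRITE"] * 4 + ["IDLE"]
dense = time_based_message_count(trace)
sparse = event_based_message_count(trace)
```

With the example sizes above, the 24 time-based messages in this trace occupy 24 × 16 = 384 bits, while the 2 event-based messages occupy 2 × 64 = 128 bits.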

The third message in this example is a timestamp message (340) which includes a 3-bit type field (342) to identify the message as a timestamp message and a 29-bit payload field (344) which is used to store the value of the timestamp. In some embodiments, the timestamp that is written into the second reserved field (330) of the event-based message (320) is the lowest eight bits of the longer timestamp that is included in the payload (344) of the timestamp message (340).
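When the event-based message carries only the lowest eight bits of the timestamp, a post-processor can recover the full value from the most recent full timestamp. The Python sketch below is one possible approach, not necessarily the one used by the visualization system; it assumes that time never runs backward and that fewer than 256 ticks separate an event from the preceding reference. The names `extend_timestamp` and `reconstruct` and the `(kind, value)` stream layout are hypothetical.

```python
LOW_MOD = 1 << 8     # the reserved field (330) holds 8 bits
FULL_MOD = 1 << 29   # the timestamp payload (344) holds 29 bits


def extend_timestamp(reference, low_bits):
    """Return the smallest time >= reference whose low 8 bits match."""
    delta = (low_bits - reference) % LOW_MOD
    return (reference + delta) % FULL_MOD


def reconstruct(stream):
    """Recover full timestamps for the events in a message stream.

    `stream` is a list of (kind, value) pairs, where kind is "timestamp"
    (value is the 29-bit payload) or "event" (value is the 8-bit field).
    Each recovered time becomes the reference for the next event.
    """
    reference = None
    times = []
    for kind, value in stream:
        if kind == "timestamp":
            reference = value
        elif reference is None:
            times.append(None)  # no full timestamp seen yet
        else:
            reference = extend_timestamp(reference, value)
            times.append(reference)
    return times


times = reconstruct([("timestamp", 0x10F0), ("event", 0xF8), ("event", 0x05)])
```

Here the second event's low bits (0x05) are smaller than the first event's (0xF8), so the sketch infers that the low byte rolled over and places the event at 0x1105.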

In this example, the timestamp message (340) is not transmitted across the message interconnect (e.g., 206 in FIG. 2). Rather, timestamp messages in this example are generated and inserted by the aggregation module (e.g., 208 in FIG. 2) into the stream written to the message capture memory (e.g., 210 in FIG. 2) for accurate representation, should the message stream be interrupted for any reason (e.g., temporary memory unavailability).

As shown in this example, in some embodiments, timestamping (e.g., at step 106 in FIG. 1) includes inserting a timestamp message (e.g., 340) into an aggregated message stream.

This example also shows that in some embodiments, timestamping (e.g., at step 106 in FIG. 1) includes writing a timestamp into a reserved field (e.g., 330) in the standardized message format (e.g., 320) that is reserved for the timestamp.

The message sizes and formatting illustrated herein are merely exemplary and are not intended to be limiting. In various embodiments, different message sizes, field sizes, and/or field locations may be used.
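For illustration, the three exemplary message formats can be modeled in Python as ordered lists of field widths. The field names and widths follow the example above; the numeric type codes, the placement of the first field in the most significant bits, and the helper names `encode`, `decode`, and `message_size` are assumptions made for this sketch and are not specified herein.

```python
# (field name, width in bits), listed in the order the fields appear.
LAYOUTS = {
    "time": [("type", 3), ("sequence", 3), ("payload", 10)],
    "event": [("type", 3), ("tag", 13), ("sub_type", 3),
              ("module_id", 5), ("timestamp", 8), ("payload", 32)],
    "timestamp": [("type", 3), ("payload", 29)],
}

# Hypothetical type codes; the actual values are not given in the text.
TYPE_CODES = {"time": 0, "event": 1, "timestamp": 2}


def message_size(kind):
    """Total size of a message of the given kind, in bits."""
    return sum(width for _, width in LAYOUTS[kind])


def encode(kind, **values):
    """Pack field values into an integer, first field in the top bits."""
    values.setdefault("type", TYPE_CODES[kind])
    word = 0
    for name, width in LAYOUTS[kind]:
        value = values.get(name, 0)  # blank (reserved) fields become 0
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(kind, word):
    """Unpack an integer produced by encode() into a dict of fields."""
    fields = {}
    shift = message_size(kind)
    for name, width in LAYOUTS[kind]:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


event_word = encode("event", tag=0x1ABC, sub_type=5, module_id=17,
                    timestamp=200, payload=0xDEADBEEF)
event_fields = decode("event", event_word)
```

Keeping the layouts in one table makes it easy to experiment with different message sizes, field sizes, and field locations.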

As is shown in FIGS. 2 and 3, in some embodiments, the reported information (e.g., received at step 100 in FIG. 1) includes a timestamped and aggregated message stream (e.g., sent from the aggregation module (e.g., 208 in FIG. 2) to the memory (e.g., 210 in FIG. 2)), and the timestamped and aggregated message stream is generated by an aggregation module (e.g., 208 in FIG. 2) included in the SOC (e.g., 200 in FIG. 2) that is configured to: receive hardware event information in a standardized message format (see, e.g., event-based message (320) in FIG. 3) from the hardware functional module (e.g., one of functional modules (202) in FIG. 2); receive firmware event information in the standardized message format (see, e.g., event-based message (320) in FIG. 3) from the firmware functional module (e.g., one of functional modules (202) in FIG. 2); and aggregate and timestamp the hardware event information in the standardized message format and the firmware event information in the standardized message format to obtain the timestamped and aggregated message stream.

In some embodiments, timestamping (e.g., hardware event information in the standardized message format and/or firmware event information in the standardized message format to obtain the timestamped and aggregated message stream) includes writing a timestamp into a reserved field in the standardized message format that is reserved for the timestamp.

In the example of FIG. 3, the sizes of the three exemplary message types (300, 320, and 340) are carefully selected so that in the capture memory (e.g., 210 in FIG. 2), the messages can be uniformly interleaved on message boundaries to prevent partial messages should message overwriting occur. The following figure shows an example of this.

FIG. 4 is a diagram illustrating two embodiments of an aggregated and timestamped message stream in a message capture memory. In this figure, a first memory segment (400) and a second memory segment (420) show two examples of aggregated and timestamped message streams that are stored in message capture memory. For example, these two memory segments (400 and 420) show examples of memory (210) in FIG. 2.

In the first memory segment (400), a 32-bit timestamp message (402) occurs first, occupying one half of a 64-bit slot of memory; the remainder of the slot is unused. In this example, each slot of memory is 64 bits and the timestamp message (402) has the example size and format shown in FIG. 3.

Next, three 64-bit event messages (404) each occupy one memory slot so that the second, third, and fourth slots are occupied by event messages.

Then, a sequence of 13 time-based messages (406) is stored. Each time-based message (406) is 16 bits long so the fifth, sixth, and seventh slots each have four time-based messages per slot. In this example, time-based messages are used to capture the state or value of a finite state machine while it is not in the idle state (e.g., from the time it leaves the idle state until the time it returns to the idle state). As such, in this example, the term “state capture” is used to describe the time-based messages (406) but in other embodiments time-based messages are used to capture or record other types of information in a functional module.

The last slot is occupied by an event-based message (408).

In this example, each event-based message (e.g., 404) occupies a single slot whereas time-based messages (e.g., 406) are written four to a slot. The advantage of keeping the sizes and (slot) offsets as shown here is that less overhead information needs to be saved than would be required if (as an example) the event messages could start at any offset within a slot instead of at a zero offset. Similarly, if the message sizes were not multiples of each other and the messages did not align with the slots as shown here, then mixing the two messages randomly would make it very hard to distinguish between message boundaries when an old message is overwritten with a new message, creating partial messages. Partial messages can have missing type identifiers (to distinguish the message type) and/or missing message content (making it hard to identify the end of the message).

It is noted that the various event messages (404 and 408) and state capture (i.e., time-based) messages (406) may be from a variety of functional modules and are not necessarily from a single functional module. Rather, they are aggregated at the aggregation module in the order in which they are received.
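The slot layout of the first memory segment (400) can be modeled with a short Python sketch. It assumes, based on a reading of the example, that only time-based messages share a slot (up to four per slot), that a timestamp message or an event message always starts a new slot, and that any space left in a slot when the next message starts a new one stays unused. The function names `pack_slots` and `unused_bits` are hypothetical.

```python
SLOT_BITS = 64
SIZE_BITS = {"timestamp": 32, "event": 64, "time": 16}


def pack_slots(kinds):
    """Assign messages, in arrival order, to 64-bit memory slots.

    Returns a list of slots, where each slot is the list of message
    kinds stored in it.
    """
    slots = []
    for kind in kinds:
        last = slots[-1] if slots else None
        shares_slot = (
            kind == "time"
            and last is not None
            and last[0] == "time"
            and (len(last) + 1) * SIZE_BITS["time"] <= SLOT_BITS
        )
        if shares_slot:
            last.append(kind)
        else:
            slots.append([kind])  # start at offset zero of a new slot
    return slots


def unused_bits(slot):
    """Number of bits in a slot that hold no message."""
    return SLOT_BITS - sum(SIZE_BITS[kind] for kind in slot)


# The example stream: 1 timestamp, 3 events, 13 time-based, 1 event.
example = ["timestamp"] + ["event"] * 3 + ["time"] * 13 + ["event"]
slots = pack_slots(example)
wasted = sum(unused_bits(slot) for slot in slots)
```

In this model the thirteenth time-based message lands alone in a slot, so three quarter-slots are left unused before the final event message; together with the unused half slot after the timestamp message, 80 of the 576 bits hold no message.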

A downside to embodiments that generate the first memory segment (400) shown is that supporting multiple message types (each having a different message size) adds to the complexity of the aggregation module. For example, to ensure that the last event message (408) starts at the beginning of the last slot, the aggregation module has to track the number of preceding state capture (i.e., time-based) messages (406) so that if the number is not a multiple of four, one or more unused fourth slots are inserted before the last event message (408). In some applications, a less complex implementation is desired, particularly if the debug system is being implemented for the first time. The second memory segment (420) shows a less complex embodiment where only event messages are stored.

The second memory segment (420) shows an example where only event-based messages (422) are generated and stored. That is, neither timestamp messages nor state capture (i.e., time-based) messages are generated (e.g., by the reporting modules or the aggregation module) in this example.

A benefit to embodiments that generate and store only event-based messages (as shown in the second memory segment (420)) is that it is much simpler for the aggregation module to store messages because there is only one type (and therefore size) of message that is supported. The aggregation module does not need to track the number of timestamp messages or state capture (i.e., time-based) messages and insert unused fourth or half slots where needed. It is also more memory-efficient because there are no unused portions, and the total amount of memory used is less than the first memory configuration because there tend to be many state capture (i.e., time-based) messages whereas event-based messages tend to be more sparse. For these reasons, in some embodiments, only event-based messages are generated and stored.

The following figure illustrates an example Flash storage controller (implemented on an SOC system) that is analyzed and/or debugged using a visualization system per the techniques described herein. Then, example screenshots are described that may be displayed by a visualization system when debugging and/or analyzing the example Flash storage controller.

FIG. 5 is a diagram illustrating an embodiment of a Flash storage controller that is implemented on a SOC system. The Flash storage controller (500) is one example of the SOC system (200) in FIG. 2 that generates hardware-reported information and firmware-reported information, and that is subsequently debugged and/or analyzed by a visualization system (e.g., per FIG. 1) using that hardware-reported information and firmware-reported information. To preserve the readability of the diagram and for ease of explanation, a sampling of exemplary hardware functional modules and firmware functional modules is shown here; this is not intended to be a complete functional block diagram. For example, although only hardware functional modules are described herein, some embodiments include firmware functional modules that are configured to report status information.

In this example, the Flash storage controller (500) includes a PCI Express (PCIe) functional module (502). In some embodiments, the PCIe (502) is a hardware functional module and IP core. For example, a third party may sell the PCIe functional module and purchasers “drop” the IP core into their SOC designs.

Another functional module in this example is the nonvolatile memory express (NVMe) module (504). In this example, the NVMe (504) is a hardware functional module that communicates with the host driver to receive host commands (e.g., initiate data fetches or copies to or from the host, etc.); it is the logical protocol layer over the physical PCIe layer.

The LDPC decoder (506) is a hardware functional module that performs error correction decoding on the data that is stored in the Flash storage media (508). For example, data stored on the Flash storage media (508) may experience data degradation due to charge leakage. This is especially true for data that has been stored for a relatively long time on the Flash storage media (508) and/or when the Flash storage media (508) is worn out and “leaky” (e.g., when the Flash storage media (508) has experienced a relatively large number of program and/or erase cycles). During a read operation, the LDPC decoder (506) may introduce a significant amount of delay when significantly degraded read data is being error corrected because the LDPC decoder must resort to more powerful decoding techniques, which also consume more time. Therefore, when analyzing the performance of the read path, the LDPC decoder (506) may be of interest since it may add significant delay.

The Flash storage controller (500) also includes Channel 0 NCmd Processor (510 a) through Channel 3 NCmd Processor (510 d), which are hardware functional modules. These functional modules are the processors that interface with the four channels via which commands are received. The number of channels shown here is merely exemplary; for larger throughput applications there may be more channels.

The following table illustrates some example events reported by the functional modules shown in FIG. 5 that are subsequently used to analyze the performance of the Flash storage controller (500) and/or debug the Flash storage controller (500). For brevity, similar or the same events that relate to multiple functional modules are not necessarily shown. For example, each functional module may report error event(s), which is useful for debugging, but for brevity Table 1 only shows an error event for the PCIe functional module.

TABLE 1 Example events that are reported by the example functional modules shown in FIG. 5.

Functional Module | (Reported) Event | Event Description
PCIe | Low Power | Reports when PCIe low-power modes and/or operations occur
PCIe | Error | Reports when a PCIe error event occurs
NVMe | ReadXferDone | End of read operation
NVMe | WriteXferDone | End of write operation
NVMe | NVMe Cmd Fetch End | End of command fetch operation
NVMe | BAR0 Write | When a write to BAR0 occurs (BAR0 is the memory space in the controller that the host writes into to communicate command information)
Channel 0 NCmd Processor | SCMD Xfer Done | Channel 0 has completed a transfer
Channel 1 NCmd Processor | SCMD Xfer Done | Channel 1 has completed a transfer
Channel 2 NCmd Processor | SCMD Xfer Done | Channel 2 has completed a transfer
Channel 3 NCmd Processor | SCMD Xfer Done | Channel 3 has completed a transfer
LDPC Decoder | Decode Write Data Xfer End | End of error correction decoding

A benefit to the visualization system described herein is that it eliminates the need for external and/or physical analyzers, which may be expensive, large, and/or limited in some way. In FIG. 5, for example, a PCIe analyzer (513) sits between the host (513) and the Flash storage controller (500). This side of the Flash storage controller (500) is sometimes referred to as the front-end. PCIe analyzers are sometimes used to analyze and/or debug SOC systems that include a PCIe functional module (502). In this example, a visualization system provides and/or supports operations related to PCIe analysis such that a physical PCIe analyzer (513) is not required. PCIe analyzers are very expensive (e.g., on the order of hundreds of thousands of dollars), so being able to analyze PCIe-related communications and/or operations without having to buy a PCIe analyzer is desired. Furthermore, in some instances, access to the Flash storage controller (500) is difficult and/or there is not enough room around the Flash storage controller (500) to attach a PCIe analyzer. For example, in an enterprise and/or cloud storage system, there may be many Flash storage controllers packed in tightly together, and there may not be sufficient room to access the Flash storage controllers and attach a PCIe analyzer (513).

Another analysis tool that the visualization system may eliminate a need for is a logic analyzer (514), which sits between the Flash storage controller (500) and the Flash storage media (508). This side of the Flash storage controller (500) is sometimes referred to as the back-end. In this example, a visualization system provides and/or supports operations related to logic analysis such that a logic analyzer (514) is not required. Logic analyzers (514) run at slower speeds compared to SOC systems; for example, the former may only run in the hundreds of kHz whereas the latter are typically in the MHz range. Therefore, when a logic analyzer is used, the SOC system (e.g., 500) must be slowed down, which in some cases eliminates or hides a bug or performance issue. In contrast, with the analysis techniques described herein, the SOC system can be run at its normal operating clock frequencies, which helps to expose or otherwise recreate a bug or performance issue. Logic analyzers (514) also tend to be memory limited, so that only a limited duration and/or limited number of signals can be captured, displayed, and analyzed. With the Flash storage controller (500) application shown in FIG. 5, the size of the Flash storage media (508) (for example, in which hardware-reported and firmware-reported information is stored before being retrieved by a visualization system) is much larger and exceeds the storage capacity of a logic analyzer (514) by orders of magnitude.

Furthermore, the features offered by PCIe analyzers and logic analyzers may also be relatively crude and/or incomplete compared to the features and/or tools offered by a visualization system (i.e., the visualization system offers features and/or tools that PCIe analyzers and logic analyzers do not). For example, PCIe analyzers and logic analyzers do not have access to the various intermediate events within the hardware and firmware functional modules which can be instrumental in identifying bugs (e.g., determining that a system “hang” occurred because one of the firmware functional modules and/or hardware functional modules did not properly signal an end event to end an operation and/or release a shared resource) and/or improving performance (e.g., having access to die usage information reported by the firmware functional modules and/or hardware functional modules to ensure die interleaving is occurring in a(n) (more) efficient manner). The following figures show some example displays presented by a visualization system, some or all of which are not supported by PCIe analyzers or logic analyzers.

FIG. 6 is a diagram illustrating an embodiment of performance-related visual information associated with read and write operations for a Flash storage controller. In this example, the display (600) shows latency information for multiple read and write operations performed by the Flash storage controller (500) from FIG. 5. This display (600) shows one example of visual information that may be displayed by a visualization system at step 106 in FIG. 1.

At the top of the display are three latency graphs: a left graph (602 a) showing read latencies and write latencies, a center graph (602 b) showing (just) read latencies, and a right graph (602 c) showing (just) write latencies. All of the graphs (602 a-602 c) have time as the x-axis and in this example those x-axis time values are obtained from a timestamp field (e.g., 330 in FIG. 3) in an event-based message (e.g., 320 in FIG. 3).

In this example, the read latency values (shown in the read and write latency graph (602 a) and read latency graph (602 b)) are calculated by subtracting the timestamp from a “ReadXferStart” event-based message (which corresponds to the start of a read operation) generated by the NVMe functional module from the (later) timestamp from the corresponding “ReadXferEnd” event message (which corresponds to the completion of a read operation). Corresponding read event messages are identified by having the same value in an appropriate field of the beginning and ending event messages. A similar calculation may be performed for write latencies using “WriteXferStart” and “WriteXferEnd” event-based messages.

The read latency graph (602 b) shows that the fastest read latencies are within the range of 0-100 μs whereas the slowest read latencies are within the range of 1,000-1,200 μs. The write latency graph (602 c) shows that the fastest write latencies are within the range of 0-25 μs whereas the slowest write latencies are within the range of 300-400 μs. By clicking on or selecting one of the slower latencies in one of the latency graphs (602 a-602 c), the visualization system in response updates the event information window (604) to display related event information for the selected read or write latency. This can, for example, help SOC developers to identify inefficiencies in the read or write path.

In this example, the Flash storage controller is already manufactured, so any short-term improvements (e.g., identified by the latency analysis shown in FIG. 6) may be implemented by adjusting settings (e.g., in hardware and/or firmware) and/or by updating the firmware running on the SOC. Long-term (e.g., hardware) improvements that are directed to hardcoded inefficiencies may be implemented in (hardware) register transfer language (RTL) so that the next generation of the SOC system is manufactured with those (hardware) improvements.

As is shown in this example, in some embodiments, the SoC (e.g., referred to in FIG. 1) includes a Flash storage controller and the visual information (e.g., displayed at step 106 in FIG. 1) includes latency information.

The bottom window (604) shows events that are reported by the functional modules. Each row corresponds to a functional module and the dots in each row correspond to an event that was reported by that functional module. The following figure shows a zoomed-in view when a first cursor (e.g., Cursor X) and a second cursor (e.g., Cursor Y) are set to a first and second time, respectively.

FIG. 7 is a diagram illustrating an embodiment of a zoomed-in window showing information from event-based messages that are reported by an NVMe functional module in a Flash storage controller. In this example, the display (700) corresponds to a zoomed-in version of the bottom window (604) from FIG. 6.

In this example, the NVMe functional module (702) has been expanded to show the events WriteXferDone (704 a), ReadXferDone (704 b), and NVMe Cmd Fetch End (704 c) that are reported by the NVMe functional module (702). To the right of each event (704 a-704 c) are the fields, contents, and/or payload of the event-based messages associated with the respective events. As shown in this example, WriteXferDone messages (706 a) include a timestamp, a ctag (e.g., a type of tag), a TAGID (e.g., a type of identifier), and a SCMDID (e.g., another identifier); ReadXferDone messages (706 b) include a timestamp and a ctag; and NVMe Cmd Fetch End messages (706 c) include a timestamp, a ctag, a read/write (R/W) indicator, a TAGID, and an FLBA (e.g., a type of address).

The following figure describes this technique of (e.g., automatically) calculating latency using starting and ending messages more generally and/or formally in a flowchart.

FIG. 8 is a diagram illustrating an embodiment of a process to generate visual information using a latency. In some embodiments, generating visual information at step 104 in FIG. 1 includes performing the process of FIG. 8. In one example, the process of FIG. 8 is performed by the visualization system (214) shown in FIG. 2.

At 800, a starting event-based message and an ending event-based message are identified based at least in part on a same value for a unique operation identifier in the starting event-based message and the ending event-based message.

For example, in Table 1 (above), there is a ReadXferDone event message. The NVMe may be configured to generate a related ReadXferStart event message. Both messages may include a field or value that uniquely identifies corresponding starting and ending event-based messages (e.g., the tag fields (324) or some part of the payload field (332) in the event-based message (320) in FIG. 3 have the same value). This may be a command sequence number, a location of data being processed, a location of a command, etc., and it will have the same value in the starting and ending event-based messages.

At 802, a latency is calculated by subtracting a starting timestamp, included in the starting event-based message, from an ending timestamp, included in the ending event-based message. See, for example, the timestamp field (330) in the event-based message (320) in FIG. 3.

At 804, visual information is generated, further based at least in part on the latency. In FIG. 6, for example, the left graph (602 a) shows a plurality of read and write latencies plotted along the y-axis, the center graph (602 b) shows read latencies plotted along the y-axis, and the right graph (602 c) shows write latencies plotted along the y-axis. The y-axis values of those plotted points are based on the latency values that are calculated.
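
Steps 800 and 802 may be sketched as follows. This is a minimal illustration under stated assumptions rather than the visualization system's implementation: the `EventMessage` fields, the `compute_latencies` name, and the choice to plot each latency at its starting timestamp are all hypothetical, and the event names are passed in as parameters rather than fixed.

```python
from dataclasses import dataclass


@dataclass
class EventMessage:
    # Hypothetical, simplified view of an event-based message.
    event: str          # e.g., "ReadXferStart"
    op_id: int          # unique operation identifier shared by start and end
    timestamp_us: int   # value of the timestamp field, in microseconds


def compute_latencies(messages, start_event, end_event):
    """Pair start/end messages by op_id (step 800) and subtract timestamps (step 802).

    Returns (start_timestamp_us, latency_us) points that could be plotted at step 804.
    """
    pending_starts = {}
    points = []
    for msg in sorted(messages, key=lambda m: m.timestamp_us):
        if msg.event == start_event:
            pending_starts[msg.op_id] = msg.timestamp_us
        elif msg.event == end_event and msg.op_id in pending_starts:
            start_ts = pending_starts.pop(msg.op_id)
            points.append((start_ts, msg.timestamp_us - start_ts))
    return points
```

An ending message without a matching starting message is ignored in this sketch; a real tool might instead flag it, since an unmatched message can itself indicate a bug.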

Latency information is only one type of information that may be displayed to debug and/or analyze (e.g., the performance of) an SOC system. The following figures show a variety of display embodiments showing a variety of visual information. By seeing a visual representation of various metrics (e.g., latency range over time, gaps in successive operations or events in high-level operations, etc.) it is easier to identify outliers and obtain information about the outliers to identify which functional modules and/or stages in the system have unintended delays.

FIG. 9 is a diagram illustrating an embodiment of visual information showing gap values associated with a front-end, a middle-end, and a back-end of a Flash storage controller. In this example, the graphs (900-905) are 2D graphs with an x-axis of sequence number and a y-axis of gap values (e.g., between successive events, in units of μs). Each graph shows gap values for a different functional module and/or different events for a given functional module.

The top graph (900) shows gap values between successive fetch operations at the NVMe module (“NVMe Fetch Module Cost” in the graph). The second-from-top graph (901) shows gap values between successive events or operations at a firmware front-end virtual queue (“FW FE VQ cost” in the graph). The third-from-top graph (902) shows gap values between successive events or operations at a firmware front-end module (“FW_FE Module Cost” in the graph). The third-from-bottom graph (903) shows gap values between successive events or operations at a firmware middle-end virtual queue (“FW_ME VQ Cost” in the graph). The second-from-bottom graph (904) shows gap values between successive events or operations at a firmware middle-end module (“FW_ME Module Cost” in the graph). The bottom graph (905) shows gap values between request events or operations at a firmware back-end virtual queue (“FW_BE_REQ VQ Cost” in the graph).

In one example to illustrate how a gap value may be calculated, the logged or recorded event information (e.g., event-based messages generated by a functional module) may include sequence numbers or other relating or identifying information (e.g., to identify which events are the successive events of interest) and the timing information (e.g., a timestamp field in the event-based message) may be used to calculate the gaps; from the event and timing information, gap information for successive events may be presented as shown here.
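
The gap calculation described above may be sketched as follows, assuming that each logged event has already been reduced to a (sequence number, timestamp) pair for a single functional module. The `gap_values` name and the tuple layout are hypothetical.

```python
def gap_values(events):
    """Gap between each event and its predecessor, ordered by sequence number.

    events: iterable of (sequence_number, timestamp_us) pairs for one functional module.
    Returns a list of (sequence_number, gap_us) points (x-axis and y-axis values).
    """
    ordered = sorted(events)  # orders by sequence number first
    return [
        (cur_seq, cur_ts - prev_ts)
        for (_, prev_ts), (cur_seq, cur_ts) in zip(ordered, ordered[1:])
    ]
```

The first event has no predecessor and therefore contributes no gap value in this sketch.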

FIG. 10 is a diagram illustrating an embodiment of visual information showing workload metrics for a Flash storage controller. In this example, the top graph (1000) shows, for both reads and writes, input/output operations per second (IOPS) along the left y-axis and command size for reads and writes along the right y-axis. The middle graph (1002) shows read and write throughput. The bottom graph (1004) shows a 3D graph of command size, time, and command count (as the three axes) for reads and writes.

FIG. 11 is a diagram illustrating an embodiment of visual information showing bus utilization for one channel in a Flash storage controller. In this example, three tables with command statistics are shown at the top: bus utilization (1100) at left, physical command timing (1102) for channel 0 at center, and physical command numbers (i.e., count) per 100 μs (1104) for channel 0 at right. These metrics are associated with bus utilization from a command code point of view. In some applications, this helps to understand whether some commands are issued more than is (e.g., absolutely) necessary to achieve a desired outcome or operation. A chip developer or designer may review the information shown in these examples to identify performance improvements (e.g., optimize the number, size, and/or sequence of commands to achieve the same functionality but more efficiently and/or faster).

For example, the command (in hexadecimal) of 0x78 is issued by the Flash storage controller to the Flash to check whether the Flash has completed a requested write or read operation. This 0x78 command is sometimes referred to herein as a status check. Sending too many status check commands brings down bus utilization. Knowing this overhead allows unnecessary status check commands to be identified and eliminated. See, for example, the circled row (1106) that shows that status check commands occupy 3.68% bus utilization for the channel in question.

The graphs (1108 and 1110) at the bottom show the gaps between SCmd (e.g., between two successive commands passed from firmware to hardware). The top graph (1108) is a histogram and/or distribution that has been sorted according to gap values. The bottom graph (1110) is the unsorted version of the information and is ordered by sequence number or time. In this example, each gap value is calculated by subtracting the timestamp of a previous SCmd end from the timestamp of a current (i.e., next) SCmd end.

In this example, the maximum gap (1113 a and 1113 b) is on the order of 80 μs whereas the average gap value is 7.45 μs. In an ideal system there should be no outliers, particularly with that much deviation from the average gap value. By identifying and analyzing such outliers, unnecessarily long gaps in firmware processing time may be identified and mitigated. Mitigating this gap also leads to higher Flash bus utilization. In this example, the displayed information is generated from hardware functional module events but also helps to represent, identify, and/or isolate hardware-firmware interaction delays.

As is shown in this example, in some embodiments, the SoC (e.g., referred to in FIG. 1) includes a Flash storage controller and the visual information (e.g., displayed at step 106 in FIG. 1) includes bus utilization information.
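
The SCmd gap analysis described for FIG. 11 may be sketched as follows. The sketch assumes a time-ordered list of SCmd end timestamps in microseconds; the `scmd_gap_report` name and the rule that a gap is an outlier when it exceeds `outlier_factor` times the average are illustrative assumptions, since no particular outlier criterion is specified herein.

```python
from statistics import mean


def scmd_gap_report(end_timestamps_us, outlier_factor=3.0):
    """Summarize gaps between successive SCmd end timestamps (time-ordered input)."""
    if len(end_timestamps_us) < 2:
        raise ValueError("at least two SCmd end timestamps are needed")
    # Each gap is the current SCmd end timestamp minus the previous one.
    gaps = [cur - prev for prev, cur in zip(end_timestamps_us, end_timestamps_us[1:])]
    average = mean(gaps)
    return {
        "unsorted": gaps,        # ordered by sequence/time
        "sorted": sorted(gaps),  # ordered by gap value
        "average": average,
        "maximum": max(gaps),
        # (index of the later SCmd, gap) for gaps far above the average (assumed rule)
        "outliers": [(i + 1, g) for i, g in enumerate(gaps) if g > outlier_factor * average],
    }
```

The index stored with each outlier points back to the SCmd whose end closed the long gap, which is the kind of starting point a developer might use to look up related events.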

FIG. 12 is a diagram illustrating an embodiment of visual information showing die access information for a Flash storage controller. In this example, the Flash storage media (e.g., 508 in FIG. 5) is implemented using multiple die and therefore accessing the Flash storage media includes selecting a die to access, and some die access sequences are more efficient than others. In general, interleaving or switching between die within a particular command set is bad and/or undesirable and successive accesses to the same die (e.g., within a command set) are good and/or desirable.

The visual information presented here permits a developer to quickly and easily identify any inefficiencies in the die access sequence. In this example, there is an isolated access to die index 1 (1200) which interrupts the sequence of accesses to die index 0 (1202 a and 1202 b). Identifying such undesirable die access sequences may help a developer understand the underlying problem and make changes in the design so that this interleaving can be eliminated or at least reduced in frequency.

As is shown in this example, in some embodiments, the SoC (e.g., referred to in FIG. 1) includes a Flash storage controller, the Flash storage controller is configured to manage access to Flash storage media that includes a plurality of die, and the visual information (e.g., displayed at step 106 in FIG. 1) includes a die access sequence associated with the plurality of die in the Flash storage media.
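
The kind of interruption shown in FIG. 12 (an isolated access to one die in the middle of a run of accesses to another die) may also be located programmatically. The following is a minimal sketch that assumes the die access sequence has already been extracted as a list of die indices; the `isolated_die_accesses` name and the three-access window are hypothetical choices.

```python
def isolated_die_accesses(die_sequence):
    """Positions where a single access to another die interrupts a same-die run.

    die_sequence: list of die indices in access order, e.g., [0, 0, 1, 0, 0].
    """
    positions = []
    for i in range(1, len(die_sequence) - 1):
        before, current, after = die_sequence[i - 1], die_sequence[i], die_sequence[i + 1]
        # The neighbors agree with each other but not with the access between them.
        if before == after and current != before:
            positions.append(i)
    return positions
```

A sequence that simply moves from one die to the next (e.g., all accesses to die 0 followed by all accesses to die 1) produces no hits in this sketch, since only an access surrounded by accesses to a different die is reported.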

FIG. 13 is a diagram illustrating an embodiment of visual information showing statistical latency information for a Flash storage controller. In this example, the table at the top (1300) has columns with average latency information (1302), columns with maximum latency information (1304), and columns with minimum latency information (1306).

FIG. 14 is a diagram illustrating an embodiment of visual information showing bus utilization information for all four channels in a Flash storage controller. The example of FIG. 14 is similar to that of FIG. 11, except bus utilization, physical command timing, and physical command numbers are shown for all four channels (see, e.g., groups 1400, 1402, 1404, and 1406) instead of just a single channel. By comparing this information side-by-side, it may be easier to identify deviations and/or underutilizations between the different channels.

In the above examples, the information presented by the various analysis tools and/or operations depends upon certain event-based messages in the firmware-reported data and/or hardware-reported data. Due to storage limitations, it may not always be feasible to generate all types of event-based messages. The following figure describes an example where an appropriate template is sent out by the visualization system to configure an SOC system to generate the proper (e.g., event-based) messages that will permit the visualization system to display a desired graph, table, or other type of (e.g., visual) information.

In some embodiments, a visualization system includes various templates that describe, for a given visual display or type of information to report, what event-based messages should be enabled and/or otherwise reported by the various functional modules. For example, generating the die (index) access sequence shown in FIG. 12 may require a certain set of event-based messages from the functional modules in the SOC while the bus utilization information shown in FIG. 11 may require a different set of event-based messages from the functional modules in the SOC.

FIG. 15 is a flowchart illustrating an embodiment of a process to display visual information using hardware-reported information and firmware-reported information generated using a template from a visualization system. In one example, the process of FIG. 15 is performed by the visualization system (214) in FIG. 2.

At 1500, a selection of visual information to display is received. For example, via a graphical user interface of the visualization system, a user may select one of the displays, graphs, tables, or other example visual information shown above to be displayed.

At 1502, a template is obtained based at least in part on the selection of visual information to display, wherein the template includes one or more reporting configurations for at least one of: (1) a hardware functional module included in a system on chip (SOC) or (2) a firmware functional module included in the SOC. In some embodiments, the templates are pre-generated and selected from some collection of stored templates. In some embodiments, a template is generated in real-time.

In the example of FIG. 2, a template (220) is sent from the visualization system (214) to the reporting modules (204) in the SOC. The reporting configurations in the template (220) are used to configure the reporting modules so that the appropriate (e.g., event-based) messages are generated for the display, analysis, and/or debugging that is desired by a user of the visualization system.

Returning to FIG. 15, at 1504, reported information is received from the SOC, wherein: the reported information includes: (1) hardware-reported information that is reported by the hardware functional module included in the SOC and (2) firmware-reported information that is reported by the firmware functional module included in the SOC; and the reported information is based at least in part on the one or more reporting configurations.

At 1506, one or more display settings are received.

At 1508, the visual information is generated based at least in part on: (1) the one or more display settings, (2) the hardware-reported information, and (3) the firmware-reported information. As described above, the template ensures that the appropriate (e.g., event-based) messages that are needed for the desired visual information have been generated by the SOC.

At 1510, the visual information is displayed. This, for example, is the visual information that was selected at step 1500.
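
Steps 1500 and 1502 may be sketched as a lookup from a selection to a stored template. Everything in the following sketch is hypothetical: the selection keys, the association of particular events with particular displays, and the `select_template` and `reporting_configuration` names are assumptions made for illustration, with event names borrowed from Table 1 and the latency example above.

```python
# Hypothetical template catalog; keys and event lists are illustrative assumptions only.
TEMPLATES = {
    "latency": {
        "NVMe": ["ReadXferStart", "ReadXferDone", "WriteXferStart", "WriteXferDone"],
    },
    "bus_utilization": {
        "Channel 0 NCmd Processor": ["SCMD Xfer Done"],
    },
}


def select_template(selection):
    """Step 1502 (pre-generated case): pick the stored template for a selection."""
    if selection not in TEMPLATES:
        raise ValueError(f"no template for selection: {selection!r}")
    return TEMPLATES[selection]


def reporting_configuration(template, available_modules):
    """Per-module enable flags for the modules that are present in the SOC."""
    return {
        module: {event: True for event in events}
        for module, events in template.items()
        if module in available_modules
    }
```

In this sketch, a reporting configuration only enables events; events that a template does not list are simply absent, which mirrors the idea of generating only the messages that the desired display needs.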

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is:
 1. A system, comprising: an interface configured to:receive a selection associated with a desired set of visual information,wherein the desired set of visual information is associated with asystem on chip (SOC) that includes a hardware functional module and afirmware functional module; and receive reported information, includingselect hardware-reported information and select firmware-reportedinformation; a processor configured to select, based at least in part onthe selection associated with the desired set of visual information, atemplate from a plurality of available templates, wherein the selectedtemplate is used by the SOC to generate the reported information,including by: configuring the hardware functional module, as prescribedby the selected template, to generate the select hardware-reportedinformation; and configuring the firmware functional module, asprescribed by the selected template, to generate the selectfirmware-reported information; a display configured to display thedesired set of visual information, including by using the selectfirmware-reported information and the select firmware-reportedinformation.
 2. The system recited in claim 1, wherein the hardwarefunctional module includes one or more of the following: a PCI Express(PCIe) functional module, a nonvolatile memory (NVMe) functional moduleassociated with communicating with a host driver to receive hostcommands, or an error correction decoder.
 3. The system recited in claim1, wherein: the reported information includes a timestamped andaggregated message stream; and the timestamped and aggregated messagestream is generated by an aggregation module included in the SOC that isconfigured to: receive hardware event information in a standardizedmessage format from the hardware functional module; receive firmwareevent information in the standardized message format from the firmwarefunctional module; and aggregate and timestamp the hardware eventinformation in the standardized message format and the firmware eventinformation in the standardized message format to obtain the timestampedand aggregated message stream.
 4. The system recited in claim 3, whereintimestamping includes writing a timestamp into a reserved field in thestandardized message format that is reserved for the timestamp.
 5. Thesystem recited in claim 3, wherein: the hardware functional moduleincludes one or more of the following: a PCI Express (PCIe) functionalmodule, a nonvolatile memory (NVMe) functional module associated withcommunicating with a host driver to receive host commands, or an errorcorrection decoder; and the hardware event information in thestandardized message format includes one or more of the following: a lowpower event associated with the PCIe functional module, an errorassociated with the PCIe functional module, an end of read associatedwith the NVMe functional module, an end of write associated with theNVMe functional module, an end of command fetch associated with the NVMefunctional module, a write to a memory space used by the host driver tocommunicate the host commands, or an end of error correction decodingassociated with the error correction decoder.
 6. The system recited inclaim 1, wherein displaying the desired set of visual informationincludes: identifying a starting event-based message and an endingevent-based message based at least in part on a same value for a uniqueoperation identifier in the starting event-based message and the endingevent-based message; calculating a latency by subtracting a startingtimestamp, included in the starting event-based message, from an endingtimestamp, included in the ending event-based message; and generatinglatency-related visual information based at least in part on thelatency, wherein the desired set of visual information includes thelatency-related visual information.
 7. The system recited in claim 1,wherein: the SoC includes a Flash storage controller; and the interfaceis configured to receive the selection associated with the desired setof visual information, including by receiving one or more of thefollowing: a first selection associated with displaying latencyinformation, a second selection associated with displaying busutilization information, or a third selection associated with displayinga die access sequence associated with a plurality of die in Flashstorage media that is managed by the Flash storage controller.
 8. Amethod, comprising: using an interface to receive a selection associatedwith a desired set of visual information, wherein the desired set ofvisual information is associated with a system on chip (SOC) thatincludes a hardware functional module and a firmware functional module;using a processor to select, based at least in part on the selectionassociated with the desired set of visual information, a template from aplurality of available templates, wherein the selected template is usedby the SOC to generate reported information, including by: configuringthe hardware functional module, as prescribed by the selected template,to generate select hardware-reported information; and configuring thefirmware functional module, as prescribed by the selected template, togenerate select firmware-reported information; using the interface toreceive the reported information, including the select hardware-reportedinformation and the select firmware-reported information; and using adisplay to display the desired set of visual information, including byusing the select firmware-reported information and the selectfirmware-reported information.
 9. The method recited in claim 8, wherein the hardware functional module includes one or more of the following: a PCI Express (PCIe) functional module, a nonvolatile memory (NVMe) functional module associated with communicating with a host driver to receive host commands, or an error correction decoder.
 10. The method recited in claim 8, wherein: the reported information includes a timestamped and aggregated message stream; and the timestamped and aggregated message stream is generated by an aggregation module included in the SOC that is configured to: receive hardware event information in a standardized message format from the hardware functional module; receive firmware event information in the standardized message format from the firmware functional module; and aggregate and timestamp the hardware event information in the standardized message format and the firmware event information in the standardized message format to obtain the timestamped and aggregated message stream.
 11. The method recited in claim 10, wherein timestamping includes writing a timestamp into a reserved field in the standardized message format that is reserved for the timestamp.
 12. The method recited in claim 10, wherein: the hardware functional module includes one or more of the following: a PCI Express (PCIe) functional module, a nonvolatile memory (NVMe) functional module associated with communicating with a host driver to receive host commands, or an error correction decoder; and the hardware event information in the standardized message format includes one or more of the following: a low power event associated with the PCIe functional module, an error associated with the PCIe functional module, an end of read associated with the NVMe functional module, an end of write associated with the NVMe functional module, an end of command fetch associated with the NVMe functional module, a write to a memory space used by the host driver to communicate the host commands, or an end of error correction decoding associated with the error correction decoder.
 13. The method recited in claim 8, wherein displaying the desired set of visual information includes: identifying a starting event-based message and an ending event-based message based at least in part on a same value for a unique operation identifier in the starting event-based message and the ending event-based message; calculating a latency by subtracting a starting timestamp, included in the starting event-based message, from an ending timestamp, included in the ending event-based message; and generating latency-related visual information based at least in part on the latency, wherein the desired set of visual information includes the latency-related visual information.
 14. The method recited in claim 8, wherein: the SoC includes a Flash storage controller; and receiving the selection associated with the desired set of visual information includes receiving one or more of the following: a first selection associated with displaying latency information, a second selection associated with displaying bus utilization information, or a third selection associated with displaying a die access sequence associated with a plurality of die in Flash storage media that is managed by the Flash storage controller.
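
For illustration only, the latency calculation recited in claims 6 and 13 might be sketched as follows. This is a minimal sketch and not the claimed implementation: the `EventMessage` type, its field names (`op_id`, `is_start`, `timestamp`), and the `compute_latencies` function are hypothetical names introduced here, and the sketch assumes each message carries a unique operation identifier, a start or end indication, and a timestamp in arbitrary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventMessage:
    """Hypothetical event-based message; field names are assumptions."""
    op_id: int       # unique operation identifier shared by a start/end pair
    is_start: bool   # True for a starting message, False for an ending one
    timestamp: int   # timestamp in arbitrary units


def compute_latencies(messages):
    """Pair starting and ending messages by op_id and return a dict
    mapping op_id to (ending timestamp - starting timestamp)."""
    pending_starts = {}  # op_id -> starting timestamp awaiting its end
    latencies = {}
    for msg in messages:
        if msg.is_start:
            pending_starts[msg.op_id] = msg.timestamp
        elif msg.op_id in pending_starts:
            # Same op_id seen on a start and an end: subtract start from end.
            latencies[msg.op_id] = msg.timestamp - pending_starts.pop(msg.op_id)
        # Ending messages with no matching start are ignored in this sketch.
    return latencies


stream = [
    EventMessage(op_id=7, is_start=True, timestamp=100),
    EventMessage(op_id=9, is_start=True, timestamp=120),
    EventMessage(op_id=9, is_start=False, timestamp=150),
    EventMessage(op_id=7, is_start=False, timestamp=160),
]
latencies = compute_latencies(stream)
```

The resulting per-operation latencies could then feed whatever latency-related visual information is desired, such as a histogram. How unmatched or repeated messages are handled is a design choice made only for this sketch.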